22 research outputs found

    Secure kk-ish Nearest Neighbors Classifier

    Get PDF
    In machine learning, classifiers are used to predict a class of a given query based on an existing (classified) database. Given a database S of n d-dimensional points and a d-dimensional query q, the k-nearest neighbors (kNN) classifier assigns q with the majority class of its k nearest neighbors in S. In the secure version of kNN, S and q are owned by two different parties that do not want to share their data. Unfortunately, all known solutions for secure kNN either require a large communication complexity between the parties, or are very inefficient to run. In this work we present a classifier based on kNN, that can be implemented efficiently with homomorphic encryption (HE). The efficiency of our classifier comes from a relaxation we make on kNN, where we allow it to consider kappa nearest neighbors for kappa ~ k with some probability. We therefore call our classifier k-ish Nearest Neighbors (k-ish NN). The success probability of our solution depends on the distribution of the distances from q to S and increase as its statistical distance to Gaussian decrease. To implement our classifier we introduce the concept of double-blinded coin-toss. In a doubly-blinded coin-toss the success probability as well as the output of the toss are encrypted. We use this coin-toss to efficiently approximate the average and variance of the distances from q to S. We believe these two techniques may be of independent interest. When implemented with HE, the k-ish NN has a circuit depth that is independent of n, therefore making it scalable. We also implemented our classifier in an open source library based on HELib and tested it on a breast tumor database. The accuracy of our classifier (F_1 score) were 98\% and classification took less than 3 hours compared to (estimated) weeks in current HE implementations

    Secure Range-Searching Using Copy-And-Recurse

    Get PDF
    Range searching is the problem of preprocessing a set of points PP, such that given a query range γ\gamma we can efficiently compute some function f(Pγ)f(P\cap\gamma). For example, in a 1 dimensional range counting query, PP is a set of numbers, γ\gamma is a segment and we need to count how many numbers in PP are in γ\gamma. In higher dimensions, PP is a set of dd dimensional points and the query range is some volume in RdR^d. In general, we want to compute more than just counting, for example, the average of PγP\cap\gamma. Range searching has applications in databases where some SELECT queries can be translated to range searches. It had received a lot of attention in computational geometry where a data structure called partition tree was shown to solve range searching in time sub-linear in P|P| using only space linear in P|P|. In this paper we consider partition trees in a secure setting where we answer range queries without learning the value of the points or the parameters of the range. We show how partition trees can be securely traversed with O(n1+ϵ+tn11d+ϵ)O(n^{1+\epsilon} + t \cdot n^{1-\frac{1}{d}+\epsilon}) operations, where n=Pn=|P|, tt is the number of operations needed to compare to γ\gamma and ϵ>0\epsilon>0 is a parameter. As far as we know, this is the first non-trivial bound on range searching and it improves over the na\ ive solution that needs O(tn)O(t\cdot n) operations. Our algorithms are independent of the encryption scheme but as an example we implemented them using the CKKS FHE scheme. Our experiments show that for databases of sizes 2232^{23} and 2252^{25}, our algorithms run ×2.8\times 2.8 and ×4.7\times 4.7 (respectively) faster than the naive algorithm. The improvement of our algorithm comes from a method we call copy-and-recurse. With it we efficiently traverse a rr-ary tree (where each inner node has rr children) that also has the property that at most ξ\xi of them need to be recursed into when traversing the tree. We believe this method is interesting in its own and can be used to improve traversals in other tree-like structures

    Secure Search via Multi-Ring Fully Homomorphic Encryption

    Get PDF
    Secure search is the problem of securely retrieving from a database table (or any unsorted array) the records matching specified attributes, as in SQL ``SELECT...WHERE...\u27\u27 queries, but where the database and the query are encrypted. Secure search has been the leading example for practical applications of Fully Homomorphic Encryption (FHE) since Gentry\u27s seminal work in 2009, attaining the desired properties of a single-round low-communication protocol with semantic security for database and query (even during search). Nevertheless, the wide belief was that the high computational overhead of current FHE candidates is too prohibitive in practice for secure search solutions (except for the restricted case of searching for a uniquely identified record as in SQL UNIQUE constrain and Private Information Retrieval). This is due to the high degree in existing solutions that is proportional at least to the number of database records m. We present the first algorithm for secure search that is realized by a polynomial of logarithmic degree (log m)^c for a small constant c>0. We implemented our algorithm in an open source library based on HElib, and ran experiments on Amazon\u27s EC2 cloud with up to 100 processors. Our experiments show that we can securely search to retrieve database records in a rate of searching in millions of database records in less than an hour on a single machine. We achieve our result by: (1) Designing a novel sketch that returns the first strictly-positive entry in a (not necessarily sparse) array of non-negative real numbers; this sketch may be of independent interest. (2) Suggesting a multi-ring evaluation of FHE -- instead of a single ring as in prior works -- and leveraging this to achieve an exponential reduction in the degree

    BLEACH: Cleaning Errors in Discrete Computations over CKKS

    Get PDF
    Approximated homomorphic encryption (HE) schemes such as CKKS are commonly used to perform computations over encrypted real numbers. It is commonly assumed that these schemes are not “exact” and thus they cannot execute circuits with unbounded depth over discrete sets, such as binary or integer numbers, without error overflows. These circuits are usually executed using BGV and B/FV for integers and TFHE for binary numbers. This artificial separation can cause users to favor one scheme over another for a given computation, without even exploring other, perhaps better, options. We show that by treating step functions as “clean-up” utilities and by leveraging the SIMD capabilities of CKKS, we can extend the homomorphic encryption toolbox with efficient tools. These tools use CKKS to run unbounded circuits that operate over binary and small-integer elements and even combine these circuits with fixed-point real numbers circuits. We demonstrate the results using the Turing-complete Conway’s Game of Life. In our evaluation, for boards of size 128x128, these tools achieved an order of magnitude faster latency than previous implementations using other HE schemes. We argue and demonstrate that for large enough real-world inputs, performing binary circuits over CKKS, while considering it as an “exact” scheme, results in comparable or even better performance than using other schemes tailored for similar inputs

    Linear-Regression on Packed Encrypted Data in the Two-Server Model

    Get PDF
    Developing machine learning models from federated training data, containing many independent samples, is an important task that can significantly enhance the potential applicability and prediction power of learned models. Since single users, like hospitals or individual labs, typically collect data-sets that do not support accurate learning with high confidence, it is desirable to combine data from several users without compromising data privacy. In this paper, we develop a privacy-preserving solution for learning a linear regression model from data collectively contributed by several parties (``data owners\u27\u27). Our protocol is based on the protocol of Giacomelli et al. (ACNS 2018) that utilized two non colluding servers and Linearly Homomorphic Encryption (LHE) to learn regularized linear regression models. Our methods use a different LHE scheme that allows us to significantly reduce both the number and runtime of homomorphic operations, as well as the total runtime complexity. Another advantage of our protocol is that the underlying LHE scheme is based on a different (and post-quantum secure) security assumption than Giacomelli et al. Our approach leverages the Chinese Remainder Theorem, and Single Instruction Multiple Data representations, to obtain our improved performance. For a 1000 x 40 linear regression task we can learn a model in a total of 3 seconds for the homomorphic operations, compared to more than 100 seconds reported in the literature. Our approach also scales up to larger feature spaces: we implemented a system that can handle a 1000 x 100 linear regression task, investing minutes of server computing time after a more significant offline pre-processing by the data owners. We intend to incorporate our protocol and implementations into a comprehensive system that can handle secure federated learning at larger scales

    Efficient Privacy-Preserving Viral Strain Classification via k-mer Signatures and FHE

    Get PDF
    With the development of sequencing technologies, viral strain classification -- which is critical for many applications, including disease monitoring and control -- has become widely deployed. Typically, a lab (client) holds a viral sequence, and requests classification services from a centralized repository of labeled viral sequences (server). However, such ``classification as a service\u27\u27 raises privacy concerns. In this paper we propose a privacy-preserving viral strain classification protocol that allows the client to obtain classification services from the server, while maintaining complete privacy of the client\u27s viral strains. The privacy guarantee is against active servers, and the correctness guarantee is against passive ones. We implemented our protocol and performed extensive benchmarks, showing that it obtains almost perfect accuracy (99.8%99.8\%--100%100\%) and microAUC (0.9990.999), and high efficiency (amortized per-sequence client and server runtimes of 4.954.95ms and 0.530.53ms, respectively, and 0.210.21MB communication). In addition, we present an extension of our protocol that guarantees server privacy against passive clients, and provide an empirical evaluation showing that this extension provides the same high accuracy and microAUC, with amortized per sequences overhead of only a few milliseconds in client and server runtime, and 0.3MB in communication complexity. Along the way, we develop an enhanced packing technique in which two reals are packed in a single complex number, with support for homomorphic inner products of vectors of ciphertexts. We note that while similar packing techniques were used before, they only supported additions and multiplication by constants

    Privacy Preserving Feature Selection for Sparse Linear Regression

    Get PDF
    Privacy-Preserving Machine Learning (PPML) provides protocols for learning and statistical analysis of data that may be distributed amongst multiple data owners (e.g., hospitals that own proprietary healthcare data), while preserving data privacy. The PPML literature includes protocols for various learning methods, including ridge regression. Ridge regression controls the L2L_2 norm of the model, but does not aim to strictly reduce the number of non-zero coefficients, namely the L0L_0 norm of the model. Reducing the number of non-zero coefficients (a form of feature selection) is important for avoiding overfitting, and for reducing the cost of using learnt models in practice. In this work, we develop a first privacy-preserving protocol for sparse linear regression under L0L_0 constraints. The protocol addresses data contributed by several data owners (e.g., hospitals). Our protocol outsources the bulk of the computation to two non-colluding servers, using homomorphic encryption as a central tool. We provide a rigorous security proof for our protocol, where security is against semi-honest adversaries controlling any number of data owners and at most one server. We implemented our protocol, and evaluated performance with nearly a million samples and up to 40 features

    E2E near-standard and practical authenticated transciphering

    Get PDF
    Homomorphic encryption (HE) enables computation delegation to untrusted third-party while maintaining data confidentiality. Hybrid encryption (a.k.a Transciphering) allows a reduction in the number of ciphertexts and storage size, which makes HE solutions practical for a variety of modern applications. Still, modern transciphering has two main drawbacks: 1) lack of standardization or bad performance of symmetric decryption under FHE; 2) lack of input data integrity. In this paper, we discuss the concept of Authenticated Transciphering (AT), which like Authenticated Encryption (AE) provides some integrity guarantees for the transciphered data. For that, we report on the first implementations of AES-GCM decryption and Ascon decryption under CKKS. Moreover, we report and demonstrate the first end-to-end process that uses transciphering for real-world applications i.e., running deep neural network inference (ResNet50 over ImageNet) under encryption

    HeLayers: A Tile Tensors Framework for Large Neural Networks on Encrypted Data

    Full text link
    Privacy-preserving solutions enable companies to offload confidential data to third-party services while fulfilling their government regulations. To accomplish this, they leverage various cryptographic techniques such as Homomorphic Encryption (HE), which allows performing computation on encrypted data. Most HE schemes work in a SIMD fashion, and the data packing method can dramatically affect the running time and memory costs. Finding a packing method that leads to an optimal performant implementation is a hard task. We present a simple and intuitive framework that abstracts the packing decision for the user. We explain its underlying data structures and optimizer, and propose a novel algorithm for performing 2D convolution operations. We used this framework to implement an HE-friendly version of AlexNet, which runs in three minutes, several orders of magnitude faster than other state-of-the-art solutions that only use HE.Comment: 17 pages, 7 figure
    corecore